AITopics | custom dataset


custom dataset

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

PALMS+: Modular Image-Based Floor Plan Localization Leveraging Depth Foundation Model

Cheng, Yunqian, Princen, Benjamin, Manduchi, Roberto

arXiv.org Artificial Intelligence, Nov-14-2025

Indoor localization in GPS-denied environments is crucial for applications like emergency response and assistive navigation. Vision-based methods such as PALMS enable infrastructure-free localization using only a floor plan and a stationary scan, but are limited by the short range of smartphone LiDAR and ambiguity in indoor layouts. We propose PALMS+, a modular, image-based system that addresses these challenges by reconstructing scale-aligned 3D point clouds from posed RGB images using a foundation monocular depth estimation model (Depth Pro), followed by geometric layout matching via convolution with the floor plan. PALMS+ outputs a posterior over the location and orientation, usable for direct or sequential localization. Evaluated on the Structured3D dataset and a custom campus dataset consisting of 80 observations across four large campus buildings, PALMS+ outperforms PALMS and F3Loc in stationary localization accuracy, without requiring any training. Furthermore, when integrated with a particle filter for sequential localization on 33 real-world trajectories, PALMS+ achieved lower localization errors compared to other methods, demonstrating robustness for camera-free tracking and its potential for infrastructure-free applications. Code and data are available at https://github.com/Head-inthe-Cloud/PALMS-Plane-based-Accessible-Indoor-Localization-Using-Mobile-Smartphones
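The layout-matching step described in this abstract can be pictured with a toy example. The sketch below is not the authors' implementation: it assumes a tiny binary occupancy grid for the floor plan, a hand-made 2x2 wall template standing in for the reconstructed point cloud, only four 90-degree orientations, and a plain overlap count normalised into a posterior. The names `rotate90`, `match_scores`, and `pose_posterior` are hypothetical.

```python
def rotate90(template):
    """Rotate a 2D grid 90 degrees clockwise."""
    return [list(row) for row in zip(*template[::-1])]


def match_scores(plan, template):
    """Slide the template over the plan and count overlapping wall cells."""
    plan_h, plan_w = len(plan), len(plan[0])
    tmpl_h, tmpl_w = len(template), len(template[0])
    scores = {}
    for r in range(plan_h - tmpl_h + 1):
        for c in range(plan_w - tmpl_w + 1):
            scores[(r, c)] = sum(
                plan[r + i][c + j] * template[i][j]
                for i in range(tmpl_h)
                for j in range(tmpl_w)
            )
    return scores


def pose_posterior(plan, template):
    """Return normalised scores keyed by (row, col, clockwise rotation in degrees)."""
    raw = {}
    rotated = template
    for k in range(4):
        for (r, c), score in match_scores(plan, rotated).items():
            raw[(r, c, 90 * k)] = score
        rotated = rotate90(rotated)
    total = sum(raw.values())
    if total == 0:
        raise ValueError("template never overlaps a wall cell")
    return {pose: score / total for pose, score in raw.items()}


# Made-up floor plan: 1 marks a wall cell, 0 marks free space.
plan = [
    [1, 1, 1, 1, 1],
    [1, 0, 0, 0, 0],
    [1, 0, 0, 0, 0],
    [1, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
]
# Made-up observation: a wall corner seen from the current position.
template = [
    [1, 1],
    [1, 0],
]

posterior = pose_posterior(plan, template)
best_pose = max(posterior, key=posterior.get)
print(best_pose, round(posterior[best_pose], 3))
```

A distribution over poses rather than a single best match is what makes the output usable by a sequential estimator such as the particle filter mentioned in the abstract.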

artificial intelligence, localization, machine learning, (17 more...)

arXiv.org Artificial Intelligence

2511.09724

Country: North America > United States > California (0.28)

Genre: Research Report > New Finding (0.67)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.93)
Information Technology > Communications > Mobile (0.87)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.67)


EVINGCA: Adaptive Graph Clustering with Evolving Neighborhood Statistics

Wiredu-Aidoo, Randolph

arXiv.org Artificial Intelligence, Nov-6-2025

Clustering algorithms often rely on restrictive assumptions: K-Means and Gaussian Mixtures presuppose convex, Gaussian-like clusters, while DBSCAN and HDBSCAN capture non-convexity but can be highly sensitive. I introduce EVINGCA (Evolving Variance-Informed Nonparametric Graph Construction Algorithm), a density-variance based clustering algorithm that treats cluster formation as an adaptive, evolving process on a nearest-neighbor graph. EVINGCA expands rooted graphs via breadth-first search, guided by continuously updated local distance and shape statistics, replacing fixed density thresholds with local statistical feedback. With spatial indexing, EVINGCA features log-linear complexity in the average case and exhibits competitive performance against baselines across a variety of synthetic, real-world, low-dimensional, and high-dimensional datasets. Clustering is central to unsupervised learning, yet classical algorithms face significant structural and scalability limits. Centroid-based methods such as K-Means [19] assume convex, linearly separable clusters, while density-based approaches like DBSCAN [8] or HDBSCAN [4], [21] often struggle under heterogeneous densities and are highly sensitive in higher dimensionality. Graph-based and deep clustering methods offer stronger performance but often demand heavy tuning or incur prohibitive computational cost. I propose EVINGCA, an alternative clustering paradigm that models cluster formation as an adaptive, evolving process on a nearest-neighbor graph.
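To give a feel for breadth-first cluster growth with a continuously updated distance statistic, here is a deliberately simplified toy. It is not EVINGCA: the abstract does not give the acceptance rule, so the sketch assumes a brute-force k-nearest-neighbor graph, tracks only the running mean of accepted edge lengths (no shape statistics, no spatial index), and uses a hypothetical `tolerance` multiplier. The names `knn_graph` and `grow_clusters` are illustrative.

```python
import math
from collections import deque


def knn_graph(points, k):
    """Brute-force k-nearest-neighbor graph: node -> [(distance, neighbor), ...]."""
    graph = {}
    for i, p in enumerate(points):
        dists = sorted(
            (math.dist(p, q), j) for j, q in enumerate(points) if j != i
        )
        graph[i] = dists[:k]
    return graph


def grow_clusters(points, k=4, tolerance=2.0):
    """Label points by BFS growth; an edge is accepted only if it is not much
    longer than the running mean of edges already accepted in this cluster."""
    graph = knn_graph(points, k)
    labels = [-1] * len(points)
    cluster = 0
    for seed in range(len(points)):
        if labels[seed] != -1:
            continue
        labels[seed] = cluster
        queue = deque([seed])
        accepted, mean = 0, 0.0
        while queue:
            node = queue.popleft()
            for dist, neighbor in graph[node]:
                if labels[neighbor] != -1:
                    continue
                # The first edge has no statistics to compare against.
                if accepted > 0 and dist > tolerance * mean:
                    continue
                labels[neighbor] = cluster
                queue.append(neighbor)
                accepted += 1
                mean += (dist - mean) / accepted  # incremental mean update
        cluster += 1
    return labels


# Two made-up, well-separated square blobs.
points = [
    (0, 0), (0, 1), (1, 0), (1, 1),
    (10, 10), (10, 11), (11, 10), (11, 11),
]
labels = grow_clusters(points, k=4, tolerance=2.0)
print(labels)
```

With `k=4` every point has one neighbor in the other blob, so the example exercises the rejection path: the long cross-blob edge exceeds the running mean by far more than the tolerance and is skipped.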

artificial intelligence, dataset, machine learning, (17 more...)

arXiv.org Artificial Intelligence

2511.00064

Country: North America > United States (0.15)

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)


An Analytical Framework to Enhance Autonomous Vehicle Perception for Smart Cities

Khan, Jalal, Khan, Manzoor, Turaev, Sherzod, Malik, Sumbal, El-Sayed, Hesham, Ullah, Farman

arXiv.org Artificial Intelligence, Oct-16-2025

Perception of the driving environment plays a vital role in autonomous driving and is being actively explored. The research community and relevant stakeholders call for the development of Deep Learning (DL) models and AI-enabled solutions to enhance autonomous vehicles (AVs) for smart mobility. There is a need to develop a model that accurately perceives multiple objects on the road and predicts the driver's perception to control the car's movements. This article proposes a novel utility-based analytical model that enables perception systems of AVs to understand the driving environment. The model consists of three modules: acquisition of a custom dataset containing distinctive objects, e.g., motorcyclists and rickshaws; a DL-based model (YOLOv8s) for object detection; and a module that measures the utility of the perception service from the performance values of trained model instances. The perception model is validated on the object detection task, and its process is benchmarked against the performance metrics of state-of-the-art deep learning models on the nuScenes dataset. The experimental results show three best-performing YOLOv8s instances based on mAP@0.5 values, i.e., SGD-based (0.832), Adam-based (0.810), and AdamW-based (0.822). However, the AdamW-based model (i.e., car: 0.921, motorcyclist: 0.899, truck: 0.793, etc.) still outperforms the SGD-based model (i.e., car: 0.915, motorcyclist: 0.892, truck: 0.781, etc.) because it has better class-level performance values, as confirmed by the proposed perception model. We validate that the proposed function is capable of finding the right perception for AVs. These results encourage using the proposed perception model to evaluate the utility of learning models and determine the appropriate perception for AVs.

artificial intelligence, deep learning, machine learning, (20 more...)

arXiv.org Artificial Intelligence

2510.1323

Country: Asia > Middle East > UAE (0.28)

Genre: Research Report > New Finding (0.66)

Industry:

Transportation > Ground > Road (1.00)
Information Technology (1.00)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)


Deep Learning-Driven Multimodal Detection and Movement Analysis of Objects in Culinary

Ishat, Tahoshin Alam, Qayum, Mohammad Abdul

arXiv.org Artificial Intelligence, Sep-19-2025

This research investigates the potential of an intelligent, multi-modal AI system that interprets visual, audio, and motion-based data to analyse and comprehend cooking recipes. The system integrates object segmentation, hand motion classification, and audio-to-text conversion with the help of natural language processing to create a comprehensive pipeline that imitates human-level understanding of kitchen tasks and recipes. The early stages of the project involved experimenting with pre-made datasets, specifically the COCO dataset for object segmentation, which yielded suboptimal results for the project's use case. To overcome this, a domain-specific dataset was curated by collecting and annotating over 7,000 kitchen-related images, later augmented to 17,000 images. Several YOLOv8 segmentation models were trained on this dataset to detect 16 essential kitchen objects. Additionally, short-duration videos capturing cooking actions were collected and processed using MediaPipe to extract hand, elbow, and shoulder keypoints. These were used to train an LSTM-based model for hand action classification. The system also incorporates Whisper, an audio-to-text transcription model, and leverages a large language model such as TinyLlama to generate structured cooking recipes from the multi-modal inputs. In the era of computer vision and automation, every crucial task in our day-to-day life is being infiltrated by artificial intelligence and machines.

accuracy, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2509.00033

Country: Asia > Bangladesh (0.14)

Genre:

Research Report (0.64)
Workflow (0.46)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)


CLIP Embeddings for AI-Generated Image Detection: A Few-Shot Study with Lightweight Classifier

Ou, Ziyang

arXiv.org Artificial Intelligence, May-19-2025

Verifying the authenticity of AI-generated images presents a growing challenge on social media platforms these days. While vision-language models (VLMs) like CLIP excel at multimodal representation, their capacity for AI-generated image classification is underexplored due to the absence of such labels during the pre-training process. This work investigates whether CLIP embeddings inherently contain information indicative of AI generation. A proposed pipeline extracts visual embeddings using a frozen CLIP model, feeds these embeddings to lightweight networks, and fine-tunes only the final classifier. Experiments on the public CIFAKE benchmark show that performance reaches 95% accuracy without language reasoning. Few-shot adaptation to a curated custom dataset with 20% of the data yields performance of 85%. A closed-source baseline (Gemini-2.0) has the best zero-shot accuracy yet fails on specific styles. Notably, some specific image types, such as wide-angle photographs and oil paintings, pose significant challenges to classification. These results indicate previously unexplored difficulties in classifying certain types of AI-generated images, revealing new and more specific questions in this domain that are worth further investigation.

large language model, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2505.10664

Country:

North America > United States > New York > Monroe County > Rochester (0.04)
Asia > Middle East > Republic of Türkiye > Karaman Province > Karaman (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report > New Finding (0.47)

Industry: Information Technology > Security & Privacy (0.32)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.90)


QuIM-RAG: Advancing Retrieval-Augmented Generation with Inverted Question Matching for Enhanced QA Performance

Saha, Binita, Saha, Utsha, Malik, Muhammad Zubair

arXiv.org Artificial Intelligence, Jan-5-2025

This work presents a novel architecture for building Retrieval-Augmented Generation (RAG) systems to improve Question Answering (QA) tasks from a target corpus. Large Language Models (LLMs) have revolutionized the analysis and generation of human-like text. These models rely on pre-trained data and lack real-time updates unless integrated with live data tools. RAG enhances LLMs by integrating online resources and databases to generate contextually appropriate responses. However, traditional RAG still encounters challenges like information dilution and hallucinations when handling vast amounts of data. Our approach addresses these challenges by converting corpora into a domain-specific dataset and constructing a RAG architecture that generates responses from the target document. We introduce QuIM-RAG (Question-to-question Inverted Index Matching), a novel approach for the retrieval mechanism in our system. This strategy generates potential questions from document chunks and matches these with user queries to identify the most relevant text chunks for generating accurate answers. We have implemented our RAG system on top of the open-source Meta-LLaMA3-8B-instruct model by Meta Inc. that is available on Hugging Face. We constructed a custom corpus of 500+ pages from a high-traffic website accessed thousands of times daily for answering complex questions, along with manually prepared ground truth QA for evaluation. We compared our approach with traditional RAG models using BERT-Score and RAGAS, state-of-the-art metrics for evaluating LLM applications. Our evaluation demonstrates that our approach outperforms traditional RAG architectures on both metrics.
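The question-to-question retrieval idea in this abstract can be sketched in a few lines. This is an illustration under stated assumptions, not the QuIM-RAG implementation: the per-chunk questions are hard-coded here (in the paper they are generated from the chunks), similarity is a bag-of-words cosine standing in for whatever matching the authors use, and the class name `QuestionIndex`, its methods, and the sample chunks are all made up.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two token-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class QuestionIndex:
    """Maps pre-generated questions back to the chunk they were derived from."""

    def __init__(self):
        self._entries = []  # (question vector, chunk id)
        self._chunks = {}   # chunk id -> chunk text

    def add_chunk(self, chunk_id, text, questions):
        self._chunks[chunk_id] = text
        for question in questions:
            self._entries.append((Counter(tokenize(question)), chunk_id))

    def retrieve(self, query, top_k=1):
        """Match the query against stored questions, not against chunk text."""
        query_vec = Counter(tokenize(query))
        best = {}
        for question_vec, chunk_id in self._entries:
            score = cosine(query_vec, question_vec)
            if score > best.get(chunk_id, 0.0):
                best[chunk_id] = score
        ranked = sorted(best.items(), key=lambda kv: kv[1], reverse=True)
        return [(cid, self._chunks[cid], score) for cid, score in ranked[:top_k]]


index = QuestionIndex()
index.add_chunk(
    "admissions",
    "Applications for the fall term close on 1 March.",
    ["When is the application deadline for fall?",
     "When do fall applications close?"],
)
index.add_chunk(
    "parking",
    "Visitor parking permits are sold at the main gate.",
    ["Where can visitors buy a parking permit?"],
)

results = index.retrieve("What is the deadline to apply for fall?")
print(results[0][0], results[0][1])
```

The retrieved chunk text would then be passed to the language model as context. Matching query to question, rather than query to passage, compares two texts of the same kind, which is the intuition the abstract points to.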

information, large language model, machine learning, (17 more...)

arXiv.org Artificial Intelligence

doi: 10.1109/ACCESS.2024.3513155

2501.02702

Country: North America > United States > North Dakota (0.14)

Genre: Research Report > Promising Solution (0.34)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)


Graph-Driven Models for Gas Mixture Identification and Concentration Estimation on Heterogeneous Sensor Array Signals

Wang, Ding, Wang, Lei, Yin, Huilin, Gu, Guoqing, Lin, Zhiping, Zhang, Wenwen

arXiv.org Artificial Intelligence, Dec-18-2024

Accurately identifying gas mixtures and estimating their concentrations are crucial across various industrial applications using gas sensor arrays. However, existing models face challenges in generalizing across heterogeneous datasets, which limits their scalability and practical applicability. To address this problem, this study develops two novel deep-learning models that integrate temporal graph structures for enhanced performance: a Graph-Enhanced Capsule Network (GraphCapsNet) employing dynamic routing for gas mixture classification and a Graph-Enhanced Attention Network (GraphANet) leveraging self-attention for concentration estimation. Both models were validated on datasets from the University of California, Irvine (UCI) Machine Learning Repository and a custom dataset, demonstrating superior performance in gas mixture identification and concentration estimation compared to recent models. In classification tasks, GraphCapsNet achieved over 98.00% accuracy across multiple datasets, while in concentration estimation, GraphANet attained an R² score exceeding 0.96 across various gas components. Both GraphCapsNet and GraphANet exhibited significantly higher accuracy and stability, positioning them as promising solutions for scalable gas analysis in industrial settings.
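For readers unfamiliar with the regression metric quoted above, the following sketch computes it, assuming the abstract means the standard coefficient of determination (one minus the ratio of residual to total sum of squares). The function name `r2_score` and the sample concentrations are illustrative and not taken from the paper.

```python
def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("score is undefined when all targets are identical")
    return 1.0 - ss_res / ss_tot


# Made-up gas concentrations and predictions, for illustration only.
measured = [10.0, 20.0, 30.0, 40.0]
predicted = [12.0, 19.0, 29.0, 41.0]
print(round(r2_score(measured, predicted), 4))
```

A score of 1 means the predictions match the measurements exactly, and a score of 0 means the model does no better than always predicting the mean.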

artificial intelligence, deep learning, machine learning, (17 more...)

arXiv.org Artificial Intelligence

2412.13891

Country: North America > United States > California > Orange County > Irvine (0.24)

Genre: Research Report > New Finding (0.46)

Industry: Energy > Oil & Gas (0.93)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)


PyPulse: A Python Library for Biosignal Imputation

Gao, Kevin, Xu, Maxwell A., Rehg, James M., Moreno, Alexander

arXiv.org Artificial Intelligence, Dec-9-2024

We introduce PyPulse, a Python package for imputation of biosignals in both clinical and wearable sensor settings. Missingness is commonplace in these settings and can arise from multiple causes, such as insecure sensor attachment or data transmission loss. PyPulse provides a modular and extendable framework with high ease-of-use for a broad userbase, including non-machine-learning bioresearchers. Specifically, its new capabilities include using pre-trained imputation methods out-of-the-box on custom datasets, running the full workflow of training or testing a baseline method with a single line of code, and comparing baseline methods in an interactive visualization tool. We released PyPulse under the MIT License on GitHub and PyPI.

artificial intelligence, imputation, machine learning, (14 more...)

arXiv.org Artificial Intelligence

2412.06382

Country: North America > United States > Illinois > Champaign County > Urbana (0.05)

Genre: Research Report (0.65)

Industry: Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (0.69)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.48)


Using Images to Find Context-Independent Word Representations in Vector Space

Kumar, Harsh

arXiv.org Artificial Intelligence, Nov-28-2024

Many methods have been proposed to find vector representations for words, but most rely on capturing context from the text to find semantic relationships between these vectors. We propose a novel method of using dictionary meanings and image depictions to find word vectors independent of any context. We use an auto-encoder on the word images to find meaningful representations and use them to calculate the word vectors. Finally, we evaluate our method on word similarity, concept categorization, and outlier detection tasks. Our method performs comparably to context-based methods while taking much less training time.

artificial intelligence, machine learning, natural language, (16 more...)

arXiv.org Artificial Intelligence

2412.03592

Country:

Europe > Bulgaria > Sofia City Province > Sofia (0.04)
North America > United States > Oregon > Multnomah County > Portland (0.04)
North America > United States > New Mexico > Santa Fe County > Santa Fe (0.04)
(7 more...)

Genre: Research Report (0.71)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Supervised Learning > Representation Of Examples (0.43)


Whisper Finetuning on Nepali Language

Rijal, Sanjay, Adhikari, Shital, Dahal, Manish, Awale, Manish, Ojha, Vaghawan

arXiv.org Artificial Intelligence, Nov-19-2024

Despite the growing advancements in Automatic Speech Recognition (ASR) models, the development of robust models for underrepresented languages, such as Nepali, remains a challenge. This research focuses on building an exhaustive and generalized dataset, followed by fine-tuning OpenAI's Whisper models of different sizes to improve transcription (speech-to-text) accuracy for the Nepali language. We leverage publicly available ASR datasets and self-recorded custom datasets with a diverse range of accents, dialects, and speaking styles, further enriched through augmentation. Our experimental results demonstrate that fine-tuning Whisper models on our curated custom dataset substantially reduces the Word Error Rate (WER) across all model sizes. We attribute this to greater data variation in speakers' age, gender, and sentiment, in acoustic environment, and in dialect; to denser audio segments (15-30 seconds) that are more compatible with Whisper's input; and to manual curation of audio and transcriptions. Notably, our approach outperforms Whisper's baseline models trained on the FLEURS dataset, achieving WER reductions of up to 36.2% on the small and 23.8% on the medium models. Furthermore, we show that data augmentation plays a significant role in enhancing model robustness. Our approach underlines the importance of dataset quality, variation, and augmentation in the adaptation of state-of-the-art models to underrepresented languages for developing accurate ASR systems.
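Word Error Rate, the metric reported in this abstract, is conventionally the word-level edit distance between the reference transcript and the model output, divided by the number of reference words. The sketch below shows that standard definition. The function name `word_error_rate` and the English sample sentences are illustrative, and text normalisation (which real evaluations usually apply first) is left out.

```python
def word_error_rate(reference, hypothesis):
    """(substitutions + deletions + insertions) / number of reference words."""
    ref_words = reference.split()
    hyp_words = hypothesis.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp_words) + 1))
    for i, ref_word in enumerate(ref_words, start=1):
        curr = [i] + [0] * len(hyp_words)
        for j, hyp_word in enumerate(hyp_words, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref_words)


# One substitution ("sat" -> "sit") and one deletion ("the") over six words.
wer = word_error_rate("the cat sat on the mat", "the cat sit on mat")
print(f"{wer:.3f}")
```

Because insertions count as errors, WER can exceed 1.0 when the hypothesis is much longer than the reference.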

artificial intelligence, deep learning, machine learning, (21 more...)

arXiv.org Artificial Intelligence

2411.12587

Country:

Asia > Pakistan (0.04)
Asia > Nepal (0.04)
Asia > India (0.04)

Genre: Research Report > New Finding (0.88)

Technology:

Information Technology > Artificial Intelligence > Speech > Speech Recognition (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.35)
